ABSTRACT

The Flesch readability index yields meaningful
information about the responses of readers to texts. Because the
formula is so simple, a group of English teachers wrote a program in
BASIC that would count some obvious surface features of a text and
calculate Flesch scores. Among the programming problems encountered
were counting words (taking into account numbers, acronyms, and
abbreviations) and counting the number of syllables in a word
(English has no regular rules for reliably dividing words into
syllables). Using only this readability program will not ensure
improved comprehension because the act of reading and comprehending
involves so many interrelated factors. However, the program can be
used to test the readability of government manuals, orders,
instructions, and so on. It can also be used to alert writers to
revise their prose according to the reading level of the intended
audience and to serve as a taking-off point for a college classroom
discussion of audience. An advantage of the readability program is
that students can run it themselves. For those who use word
processors, a readability program will probably end up as one more
utility program, like a word counter or spelling checker, that
quickly provides potentially useful information about one's prose.
(HOD) 

Measuring Readability with a Computer
What We Can Learn
by

Glenn Spiegel, US General Accounting Office
John J. Campbell, Howard University

What can computers tell us about writing that is meaningful in
human terms, that goes beyond merely counting and tabulating? Most
of us looking at computer applications to writing seem to be up
against a corollary of Murphy's law that states, "The more
meaningful the question we ask, the less likely we are to get an
answer from the computer."

Readability measurement seems, at first glance, to be an area
where something a computer does well (counting) can yield meaningful
information about human activities (the response of readers to
texts). And so we set out to do something we thought was
simple: write a program in BASIC that would run on our
microcomputers, count some obvious surface features of a text, do a
little arithmetic, and calculate a commonly used index of
readability, the Flesch index. This index, we felt, could tell a
writer useful things about how easily his or her prose (or someone
else's) will be understood. Readability scores, after all, are
usefully applied to a range of things from children's books to
procedures for nuclear reactor safety.

Of course there was no escaping Murphy's Law. Even "simple"
surface features of a text make severe demands on a computer's
capabilities, and a readability score turns out not to be quite as
meaningful as one would hope.

Measuring Readability

Readability is measured by reassuringly concrete methods. 
Readers are given a text to read and are then tested on how much 
they understand. Two popular testing methods are simply asking 
questions about the passages and the "cloze" procedure, in which 
people are given a text with words deleted and are asked to guess 
what the deleted words are.1 Both these methods test the actual 
response of readers to actual texts. 

Readability formulas are developed by finding numerical
measures of surface features that correlate well with the scores
obtained from actual readers. In a pioneering work, Gray and
Leary2 rewrote standard texts in a variety of ways to test for the
effects of 44 different style variables (such as sentence length,
whether verbs are active or passive, and number of pronouns) on
readability. (They recognized that content, format, and
organization also affected readability, but felt that only style
could be quantified and tested objectively.) They found that 20 of
their variables were significant, but that five accounted
for most of the variance. Increasing the following made prose
harder to read:

• number of hard words used
• number of different words used
• average sentence length
• number of prepositional phrases

Increasing the following made prose easier to read:

• number of personal pronouns

The first four of these are rough measures of diction and syntactic
complexity, while the fifth seems to relate to reader interest.

1 J.R. Bormuth, "Readability: A New Approach," Reading Research
Quarterly, vol. 1 (1966), pp. 79-132.

2 W.S. Gray and B.E. Leary, What Makes a Book Readable?
(Chicago: Univ. of Chicago Press, 1935).

Rudolf Flesch, another pioneer, developed a simpler formula
that takes into account vocabulary and syntax and that correlates
well with tests on readers.2 Flesch's formula calls for counting
the words, syllables, and sentences in a sample of text. (He
cautions that this does not mean that word and sentence length are
the only determinants of readability, only that they can serve as
indicators.)

The formula gives a numerical score, RE (for Reading Ease),
determined as follows:

RE = 206.835 - 0.846 * (syllables per 100 words)
             - 1.015 * (average words per sentence)

(Note the six-digit precision of the first coefficient; we'll return
to this later.)
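
To make the arithmetic concrete, here is a minimal sketch of the
Reading Ease calculation in Python. It is an illustration only, not
the BASIC program described in this paper; the function name
flesch_reading_ease and the sample counts are assumptions chosen for
the example.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Return the Flesch Reading Ease (RE) score from raw counts."""
    if words <= 0 or sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    syllables_per_100_words = 100.0 * syllables / words
    words_per_sentence = words / sentences
    return 206.835 - 0.846 * syllables_per_100_words - 1.015 * words_per_sentence


# Hypothetical sample: 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease(words=100, sentences=5, syllables=150)
print(round(score, 1))  # prints 59.6
```

With 150 syllables per 100 words and 20 words per sentence, the score
comes out a little under 60.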



Flesch suggests interpreting the results by the following
table:

RE score    Reading level
0-30        Graduate or specialized knowledge required
30-50       College level
50-60       High school level
60-70       Eighth-grade level
70-80       Seventh-grade level
80-100      Sixth-grade or lower level

In the second column, the information that really counts, we now
have one-digit precision, and even that is a little questionable.
We can put the number 8 on a certain grade level, but that really
gives us only a probability that any given student will be able to
read the material. How many eighth graders read at exactly the
eighth-grade level?
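
The table lookup can be sketched in a few lines of Python. This is
only an illustration: the table's ranges share their end points (30
appears in two rows), so the sketch assumes each upper bound is
exclusive, and the names BANDS and flesch_level are hypothetical.

```python
# Upper bound of each band (treated as exclusive, an assumption) and its label.
BANDS = [
    (30.0, "Graduate or specialized knowledge required"),
    (50.0, "College level"),
    (60.0, "High school level"),
    (70.0, "Eighth-grade level"),
    (80.0, "Seventh-grade level"),
]


def flesch_level(score: float) -> str:
    """Map a Reading Ease score to the reading level in Flesch's table."""
    for upper_bound, label in BANDS:
        if score < upper_bound:
            return label
    # Anything at or above 80 falls into the last row of the table.
    return "Sixth-grade or lower level"


print(flesch_level(59.6))  # prints High school level
```

Because the result has only one digit of real precision, a score that
falls near a boundary should be read as belonging to either
neighboring level.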

Another popular formula of this type was developed by Robert
Gunning, who used word and sentence length in a slightly different
way to produce what he calls the "fog index."4 It is slightly less
reliable than the Flesch score but has enjoyed considerable success
among those who enjoy flogging bureaucratic prose but don't
necessarily know anything about writing. (A danger we will return
to later is that of putting numbers into the hands of people
who don't understand the concepts behind them.)

Whatever the accuracy, though, a Flesch score tells us an
experimentally validated truth about something of genuine
interest: how hard a text is to understand. Furthermore, the
formula is so simple that even an English teacher can write a
program to calculate these values. Well, maybe.

4 Robert Gunning, The Techniques of Clear Writing (New York:
McGraw-Hill, 1952).

Writing the Program: Ambiguity and Compromise

The first programming problems came up in what should be the 
simplest task — counting the words. What does one do about numbers, 
acronyms, and abbreviations? Should an acronym be counted as a word 
for each letter? The first compromise was to count any number or 
acronym as one word, even though we know they impair readability. 
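
The compromise described above, counting any number or acronym as a
single word, can be sketched with one regular expression. This is an
illustrative Python sketch, not the BASIC program itself; the pattern
and the name count_words are assumptions.

```python
import re

# A token is a run of letters or digits, optionally joined by internal
# periods, commas, apostrophes, or hyphens, so that "U.S.", "1,250",
# "don't", and "state-of-the-art" each count as one word.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:[.,'\-][A-Za-z0-9]+)*")


def count_words(text: str) -> int:
    """Count words, treating each number or acronym as a single word."""
    return len(TOKEN.findall(text))


print(count_words("The GAO issued 1,250 reports in 1983."))  # prints 7
```

A pattern like this will still miscount some text (for example, two
sentences typed with no space after the period), which is in keeping
with the compromises discussed here.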

The next problems came from the ambiguity of written English
surface structures. Sentences would not seem to be hard to count;
after all, they are well marked by initial capitalization and
terminal punctuation. But a computer needs to have every detail in
place, and a period does not always mark a sentence end,
and neither does a question mark. (It may be part of an embedded
quotation.) A period followed by two spaces does always mark a
sentence end (at least in accurately typed text), but if it ends a
paragraph, it will be followed not by two blank spaces but by a
carriage return, as would an abbreviation that happens to come at
the end of a line. These problems can be solved (except maybe for
sentences with embedded quotations), but some inaccuracy creeps in.
The whole problem is much less trying, though, if one keeps in mind
that the answer will have only one significant digit.
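
The sentence-end rules just described can be sketched as follows. This
Python sketch is an illustration under stated assumptions, not the
authors' BASIC code: it assumes text typed with two spaces between
sentences, and the ABBREVIATIONS list is a short hypothetical sample.
Like the approach described above, it does not handle embedded
quotations.

```python
import re

# Hypothetical, deliberately short list; a real one would be longer.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "u.s.", "etc.", "e.g.", "i.e.", "vol.", "pp."}

# A word ending in terminal punctuation (optionally followed by a closing
# quote or parenthesis), then two spaces, a line break, or end of text.
END = re.compile(r"(\S*[.?!])[\"')]*(?:  |\n|$)")


def count_sentences(text: str) -> int:
    """Count sentence ends using the two-space and line-break rules."""
    count = 0
    for match in END.finditer(text):
        last_word = match.group(1).lower()
        followed_by_two_spaces = match.group(0).endswith("  ")
        # At a line break, an abbreviation is not treated as a sentence end.
        if not followed_by_two_spaces and last_word in ABBREVIATIONS:
            continue
        count += 1
    return count


print(count_sentences("Mr. Smith went home.  He slept.\n"))  # prints 2
```

The period after "Mr" is followed by a single space, so it is never
considered; only the two-space and line-break cases are counted.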

The worst problem is with syllables. English has no regular
rules that will reliably divide words into syllables. Typesetters
use complex computer algorithms coupled with tables of exceptions to
determine word breaks,5 and these have to be checked by human
editors. On a mainframe, one could, of course, include in one's
program a table of all the words the program might encounter with
the number of syllables each has, but that's out of the question for
a micro, and setting up the table was too boring even to
contemplate.

Of course people have trouble with syllables, too. Phoneticians
don't agree on what defines a syllable, nor do speakers of different
dialects of English agree on how to pronounce words. Does "idea"
have two syllables or three? It depends on who you ask. At an
early stage, we considered getting graduate assistants to do some
syllable counts for us to validate our program, but we discovered
that they weren't very good at recognizing and counting syllables
either. One of us found that, despite his Ph.D. in English, he was
unable to get reproducible results counting syllables in a 200-word
sample passage. Fortunately, we were saved again by the one-digit
precision of the Flesch score. If we count letters and divide by
3.0, we get a value that will rarely vary more than about ten
percent from the counted number of syllables for the type of prose
(government reports) we are mainly working with.
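
The letters-divided-by-three shortcut is easy to sketch. The Python
below is an illustration, not the BASIC program; the function name
estimate_syllables and the sample sentence are assumptions, and the
estimate is only meant to hold on average over a passage, not word by
word.

```python
def estimate_syllables(text: str) -> float:
    """Estimate the syllable count as (number of letters) / 3.0."""
    letters = sum(1 for ch in text if ch.isalpha())
    return letters / 3.0


# 18 letters, so the estimate is 6.0; a hand count of this sentence
# (the, re-port, was, is-sued) also gives six syllables.
print(estimate_syllables("The report was issued."))  # prints 6.0
```

Individual words can be far off (a long word with many short
syllables, for instance), so the estimate should be applied to a
whole sample rather than to single words.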

So, having been rescued from the computer's need for
exact specification by the imprecision of human life, we finally had
a program that calculated Flesch scores. We were now ready to
explore just what our results could be used for.

5 See, for example, Donald Knuth, TeX and METAFONT: New Directions
in Computer Typesetting (New York, 1979).

What Do Readability Measures Tell Us About Actual Reading?

As we have seen, readability formulas take as their indicator
variables two factors from the many involved in the complex act of
reading to comprehend ideas. Several other factors may have more
influence on comprehension than the style characteristics used to 
arrive at a readability score. 

The act of reading and comprehending ideas from the printed
page requires the interaction of the reader and the text.
Characteristics of both reader and text influence comprehension.
Reader characteristics include purpose, motivation, interest,
knowledge of content, experience with the type of document, and
ability to use an appropriate reading strategy. Text factors
involve prose and non-prose influences. Prose factors include
vocabulary and syntax (the two factors Flesch's formula addresses) 
as well as concept load, concept density, and the organization of 
the ideas in the text. Non-prose factors include the reader's 
environment and the use of illustrations. 

A reader's purpose, motivation, and interest in comprehending
the ideas in the document greatly influence the amount of
concentration devoted to comprehension. Consider the casual reading
of a novel for pleasure. The reader is not under any external
pressure to prove that comprehension has taken place; thus, he/she
has the flexibility of choosing what to remember. Now suppose that
the same novel is assigned to two college students: a business
major and an English major. Both would be under pressure to prove
comprehension. Perhaps the business major would view the reading
and comprehension in the larger context of all the courses required
for graduation. On the other hand, the English major would have a
higher degree of purpose, motivation, and interest in attaining high
comprehension. Both, of course, would wonder if it was going to be
on the final, but the English major would be more motivated to get a
high grade in an English course and would be likely to try to
remember more of the text. Finally, consider what would happen to
the students' motivation and retention if the final exam questions
on the book were handed out when it was assigned. Research
indicates that providing readers with questions before and during
the reading act (giving the reader specific purposes and motivation)
improves comprehension.6

According to one study,7 a reader's
knowledge of the content being read is the most important variable
influencing comprehension. Reading about a familiar subject
involves fitting the ideas into a well-defined memory structure.
The more information a reader has in his/her cognitive structure, the
more active the comprehension process. Familiar ideas are
reinforced; new ideas are placed in and connected to an existing
knowledge structure. However, the reader who has little or no
knowledge base must begin to develop the cognitive structure while
reading. In many cases, the reader is unaware of which ideas are
most important and which are the supporting ideas. The reader who
knows little about the subject is likely to focus on the word level
of comprehension, rather than the ideas, in order to decode new
terminology.

6 See T. Andre, "Does Answering Higher-level Questions While
Reading Facilitate Productive Learning?" Review of Educational
Research, 49 (1983), pp. 280-318, and E. Rothkopf and R. Bloom,
"Effects of Interpersonal Interaction on the Instructional Value of
Adjunct Questions in Learning from Written Material," Journal of
Educational Psychology, vol. 61 (1970), pp. 417-422.

7 D. Ausubel, The Psychology of Meaningful Verbal Learning (New
York: Grune and Stratton, 1963).

Experience with the format of the material being read also
affects comprehension. A college student may have considerable
experience in reading and comprehending a textbook format, for
example. This experience allows the student to predict the location
of important ideas in the text by using previous knowledge about how
a typical textbook is organized. But the student might lack
experience with another type of document (for example, work-related
procedures or tax instructions) and would have to work to discover
the organization of ideas before comprehending the structure of the
ideas.

Finally, the reader needs to use appropriate reading strategies
to comprehend ideas. Goodman8 found that readers who scored low on
a comprehension test were not aware of the specific strategies used
to comprehend written text. They reported a focus on decoding the
individual words in the text, not on searching for the logic and
structure of the ideas. Readers who scored high on the
comprehension test reported using a variety of reading strategies to
locate and process the ideas in the text. For example, they used
skimming and scanning techniques to preview the material, they
turned headings into questions in order to read for a specific
purpose, and they reread to gain a clearer understanding and link
ideas.

8 Goodman, "A Linguistic Study of Cues and Miscues in
Reading," Elementary English, Oct. 1965, pp. 639-643.

The above discussion has focused on the variables associated 
with the audience for a document. No readability index will tell us 
anything about them, nor will any rewriting of the text do anything 
about them. We must still consider relevant prose and non-prose 
features of the text, however, and we can revise these to improve 
comprehension. 

Semantic features of a text also influence how well a reader
comprehends. In particular, the complexity and density of ideas and
how these ideas are organized influence comprehension. For example,
Kintsch9 and Kintsch and Van Dijk10 have found that the density or
number of relationships among ideas in a passage significantly
affects readers' ability to recall information. They tested readers
on passages that differed in the number, complexity, and
organization of ideas included but that had similar readability
scores. Readers remembered more when the number of ideas was
reduced and they were presented in a hierarchical order. Frase,11
among others, has reported better comprehension results when
subjects were given passages arranged in a logical order compared to
passages where the same sentences were randomly ordered.

9 W. Kintsch, The Representation of Meaning in Memory
(New York: Wiley, 1974).

10 W. Kintsch and T.A. Van Dijk, "Toward a Model of Text
Comprehension and Production," Psychological Review, 85
(1968), pp. 363-394.

Segmenting prose text into different content groups also
improves comprehension scores. For example, Frase and Schwartz12
took a standard paragraph text format and segmented the sentence
components by different forms of indentation. The segmented text
resulted in 14 to 78 percent faster responses to questions about the
text. Thus, it appears that both segmentation and indentation
influence comprehension.

Non-prose factors also seem to influence comprehension.
Research in human factors indicates that environmental factors, such
as heat and light, affect a reader's comprehension. Moreover,
specific features in a document (e.g., size, legibility of print,
color) influence not only what a reader interacts with but also how
the interaction occurs and what is gained from it.13

11 L. Frase, "Influence of Sentence Order and Amount of
Higher-Level Text Processing Upon Reproductive and Productive
Memory," American Educational Research Journal, vol. 5 (1976),
pp. 307-319.

12 L. Frase and B. Schwartz, "Typographical Clues that
Facilitate Comprehension," Journal of Educational Psychology,
vol. 71 (1979), pp. 197-206.

13 M.M. MacDonald-Ross, "Language in Texts: The Design of
Curricular Materials," in L.S. Shulman, ed., Review of Research in
Education, vol. 6 (Peacock Publishers, Inc., 1978).

The use of graphic devices has been reported to aid
comprehension. MacDonald-Ross reports on several studies indicating
that graphic devices such as tables, graphs, charts, illustrations,
color, margin widths, and highlighting techniques (headings,
subheadings, italics, underlining, etc.) have improved
comprehension.14

Thus, the reading/comprehension act involves many interrelated
factors. These factors, related to the reader and the text, can
have a significant effect on the comprehension of the ideas
represented by the prose. Merely applying a readability formula,
which looks at only part of the language factor, will not ensure
improved comprehension.

Where Are Readability Scores Used?

Children's books and school texts would seem to be the ideal
field for readability measurements. For one thing, the difference
among different reading levels at different ages is more universal
and developmental than the differences in vocabulary and skills
among adults, which may be determined by employment, reading, and so
on. We can count on more homogeneity among children at a given
stage of development than we can among adults. Indeed, most
children's texts are tested for readability before publication.
(Testing may, however, be done using methods other than Flesch
scores, such as the Dale-Chall score, which checks to see how many
words in the text come from beyond a standard vocabulary thought to
be known by most fourth graders.15)

14 M.M. MacDonald-Ross, "Graphics in Text," in L.S. Shulman,
ed., Review of Research in Education, vol. 5 (Peacock Publishers,
Inc., 1978).
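
The vocabulary check behind the Dale-Chall score, as described above,
can be sketched as a count of words that fall outside a list of
familiar words. The Python below is only an illustration of that one
step; it is not the Dale-Chall formula itself, and the FAMILIAR set
is a tiny hypothetical stand-in for a real word list.

```python
import re

# Hypothetical placeholder vocabulary; a real familiar-word list is far longer.
FAMILIAR = {"the", "dog", "ran", "to", "a", "big", "house", "and", "sat"}


def unfamiliar_fraction(text: str, familiar=FAMILIAR) -> float:
    """Return the fraction of words in text that are not on the familiar list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    unfamiliar = [word for word in words if word not in familiar]
    return len(unfamiliar) / len(words)


# One of the seven words ("edifice") is not on the list.
print(round(unfamiliar_fraction("The dog ran to a big edifice."), 3))  # prints 0.143
```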

At the other end of the spectrum, the Department of Defense
uses readability testing extensively in preparing manuals, orders,
instructions, and so on. The average recruit, who may have to
maintain a state-of-the-art piece of electronic gear, reads at the
fourth-grade level. The military takes readability measurement so
seriously that the Navy, for example, maintains a list of words they
expect their ratings to know and continually rechecks the words on
the list through a testing program. The Naval Training Center,
Orlando, FL, has developed a computer program that checks text for
words not on this list and suggests synonyms from the list.16
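
The general idea of checking text against an approved word list and
suggesting replacements can be sketched as below. This Python sketch
says nothing about how the Navy's program actually works; the
APPROVED list, the SYNONYMS table, and the function name review are
all hypothetical.

```python
import re

# Hypothetical approved word list and synonym table, for illustration only.
APPROVED = {"use", "start", "the", "pump", "before", "you", "help"}
SYNONYMS = {"utilize": "use", "commence": "start", "assist": "help"}


def review(text: str):
    """Return (word, suggestion) pairs for words not on the approved list.

    The suggestion is None when the synonym table has no entry.
    """
    flagged = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in APPROVED:
            flagged.append((word, SYNONYMS.get(word)))
    return flagged


print(review("Commence the pump before you utilize it"))
# prints [('commence', 'start'), ('utilize', 'use'), ('it', None)]
```

A word with no suggestion, like "it" here, simply signals that the
approved list or the synonym table needs a human decision.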

One of the authors of this article helped prepare a report for
the U.S. General Accounting Office on the effectiveness of
automobile recall letters. The report used readability measurements
of actual recall letters to show that most of the people they were
addressed to would have serious difficulty in comprehending the
message, since the letters were written at the graduate level or
above and the average reading level of the American public is the
eighth grade. We validated this conclusion by informal testing of
the actual automobile return rate for the original letters (17%) and
for a rewritten and reformatted version (84%).

15 E. Dale and J. Chall, "A Formula for Predicting Readability,"
Educational Research Bulletin, Jan. 21 and Feb. 17, 1948, pp. 11-20
and 37-54.

16 J.P. Kincaid, J.A. Aagard, and J.W. O'Hare, Development and
Test of a Computer Readability Editing System (CRES) (Orlando,
FL: Training Analysis and Evaluation Group, 1980), Technical Report
No. 83.

Caveats

The General Accounting Office issues 500 to 700 lengthy reports 
a year and is concerned about the accessibility of its findings. It 
would seem to be an ideal place to introduce the use of readability 
formulas such as the Flesch score. We have been reluctant to do so, 
however, for reasons already alluded to. 

Many GAO staff members are accountants and social scientists by 
training; few have professional expertise in writing, but most feel 
safe dealing with numbers. For this very reason, we don't encourage 
them to use readability formulas. We feel this might lead to
writing by numbers. We especially worry that unskilled writers
might assume the formula is a rewriting rule. (Although many
authorities have warned against using readability formulas as a
guide for rewriting material,17 the formulas are being used this
way.)

Anyone can write short words and sentences. But the result is
not necessarily good. Short sentences make prose choppy. Short words
may cause clumsy explanations of things that have good long names.
This paragraph is an example.

17 See, for example, G.R. Klare, Readability Standards for
Army-wide Publications (Fort Harrison, Ind.: U.S. Army
Administrative Center, 1979).

The point we are making is that readability formulas may not
even be useful in situations that seem most promising, because
they don't say enough about prose that is meaningful. They are too
simple and too liable to misinterpretation. No number can tell us
much about prose by itself; it needs an expert to explain and apply
it. It will not give an unskillful writer more expertise; it can
only help a skillful writer to diagnose potential problems and
fine-tune his prose.

How Do We Use Readability Formulas?

One of us regularly runs our readability program on all his 
prose; he can call it up from inside his word processing software. 
Generally, he gets Flesch scores of around college level. If this 
is the case, he does nothing special; he is writing for people with 
college degrees. For one project, however, which called for writing 
instructions for secretaries on typing a new document format, he 
found that a Flesch score of college level triggered a rewrite. He 
spent most of his effort in reformatting the document, however, not 
in changing words and sentences. He changed the segmentation and
indentation, paying some attention as well to vocabulary, but little
to sentence length. Merely shortening the sentences in a list of
instructions is unlikely to help the secretary at the word processor 
who is still looking for the instructions. 

How Can Readability Measurement Be Useful in the College Classroom?

Certainly one wouldn't want to tell college students, any more
than accountants, to make their already underdeveloped sentences any
shorter. In fact, we think the most valuable use of readability as
a concept in the college classroom is as a taking-off point for a
discussion of audience, a group that is all too sadly missing from
most college writing projects. A college student can safely assume
that his professor can and will get through the prose of his paper
somehow, no matter how bad it is. After all, the professor is a
skilled reader, and it is his or her job to read student papers and
even write comments on them to prove they have been understood.
Writing a paper for someone with less skill and less knowledge (dare
we say less interest?) might prove a very useful challenge, if
prefaced by a discussion of what really influences readers'
comprehension. The assignment itself might usefully call for the
student to write instructions, perhaps for a word processor, and the
instructions should actually be tested on other students.

An advantage of a readability program is that students can run
it themselves. The novelty of a new toy, coupled with the rapidity
of the feedback, may encourage students to look carefully at their
writing (but beware of writing by the numbers). This would be most
appropriate to upper-level students who have become sufficiently 
expert in some subject area to clog their prose with jargon. 

A readability score may also be helpful in that it cannot be 
dismissed by students as a teacher's personal preference. After 
all, no matter what grade last semester's teacher gave him or her, a 
student can hardly argue with a number generated by simple counting 
and multiplying.

For those students (and faculty) who use word processors, a
readability program will probably end up as one more utility
program, like a word counter or spelling checker, that provides,
quickly and easily, some potentially useful information about one's
prose. Ultimately, it is not going to be a breakthrough into an
area where computers can tell us something profound and meaningful
about writing. What it may do is prompt us to apply our own
knowledge.18



18 We will be happy to send you a listing of our program
(written in BASIC) or to transmit it to you electronically. Contact
Glenn Spiegel, Writing Resources Branch, US GAO, 441 G St. NW, Room
4528, Wash. DC, 20548.